So welcome everybody. Today Jonas Utz will present a paper by Jake Snell and colleagues
entitled "Prototypical Networks for Few-shot Learning". Jonas, the stage is yours.
Alright, thank you for the kind introduction and let's start. So this talk is divided into
three major parts. First, we will have a gentle introduction to few-shot learning with
the most important terminology, but the main part of the talk is of course the concept
of prototypical networks. Alongside the mathematical derivations and the core algorithm, we will
see different interpretations of prototypical networks and hear about major design choices
as well as zero-shot learning. The third part of this talk will be about the performed experiments
on three major datasets for meta-learning. The results are also compared with state-of-the-art
meta-learning algorithms. Alright, so let's start with an introduction to few-shot learning.
Let's have a look at the most important terminology of few-shot learning. Some of the terms were
already mentioned in last week's talk by Aka and Benjamin, but I think it's good to revise
them. So what is few-shot classification? In few-shot classification, the classifier needs
to classify classes at test time which were not seen at training time. To do so,
few samples of these unseen classes are provided. We'll hear more about these classes in a second.
When we talk about few-shot classification, we usually specify how many classes we want
to classify and how many support samples we provide for these classes. The number of classes
is called the way and the number of samples is called the shot. So the usual terminology
is k-way n-shot classification; for example, 5-way 1-shot means five classes with one support sample each. There are two special cases which usually gain attention.
These are one-shot classification, where we have only one support sample per class, and
zero-shot classification or learning, where we have no support samples at all.
We can actually do one-shot learning easily ourselves. Here we see an example. I think
the one with the Segway is pretty easy, but on the right you also see some characters
from an alphabet unknown to us. Given the one example in the box, you can now guess
which of these characters belong to the same class. So find the remaining characters. They
are actually in the second row, third column, and in the fourth row, second column.
So we're working with two sets as previously mentioned, the support set and the query set.
Our support set comes with labels, and the number of support samples is denoted as N
and is determined by the number of shots, as I introduced in the previous slide. Each
sample in the support set is represented by a d-dimensional feature vector and of course
a scalar label. The number of classes in the support set is given by the ways, again as
I mentioned in the previous slide. The second set is the query set and these are the samples
we want to classify, and at test time we do not have labels for these. The classes in
the query set and the support set are the same. And since we are performing
our predictions on the query set, these are also the samples which we use for computation
of the loss during training and accuracy during testing.
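To make the episode structure concrete, here is a small sketch of how a k-way n-shot episode with a support and a query set could be sampled from a labelled dataset. The function name and the dictionary layout of the data are illustrative assumptions, not something taken from the paper or the talk.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Sample one few-shot episode (illustrative sketch, not the authors' code).

    data_by_class: dict mapping class label -> list of samples for that class.
    Returns support and query sets as lists of (sample, episode_label) pairs.
    """
    # Pick n_way classes for this episode.
    episode_classes = random.sample(list(data_by_class.keys()), n_way)

    support, query = [], []
    for episode_label, cls in enumerate(episode_classes):
        # Draw k_shot support samples and n_query query samples per class.
        samples = random.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, episode_label) for x in samples[:k_shot]]
        query += [(x, episode_label) for x in samples[k_shot:]]
    return support, query
```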
Alright, so now let's look at prototypical networks and the actual model idea. To start
with we look at each of our support samples and put them into an embedding function denoted
here as f phi. This maps our d-dimensional feature vector into the m-dimensional embedding
space. In the embedding space we sum over all embeddings of a class and divide
by the number of samples per class; this is simply the mean of the class embeddings,
which we will call the prototype. Now that we have computed class prototypes from our support
set we want to use those for computing class probabilities for our query samples. For that
we need a distance function, for example the squared Euclidean distance. Now we also put
the query sample into the embedding function to bring it into the embedding
space. There we simply evaluate the distance to all of the class prototypes, and now we
want to obtain probabilities and we can do that with a very well-known formula. At least
I hope you know it: the softmax function. So we simply plug our negative distances into
the softmax function and obtain the probabilities for our classes.
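Written out, with S_k denoting the support samples of class k, f_phi the embedding function, and d the squared Euclidean distance, the prototype and the softmax over negative distances from the paper are:

```latex
c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i),
\qquad
p_\phi(y = k \mid x) =
  \frac{\exp\left(-d\left(f_\phi(x), c_k\right)\right)}
       {\sum_{k'} \exp\left(-d\left(f_\phi(x), c_{k'}\right)\right)}
```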
We train this by minimizing the negative log-likelihood of the true class.
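Putting the whole forward pass together, here is a minimal PyTorch-style sketch of the computation just described, assuming an embedding network f_phi and integer class labels 0 to n_way-1 within the episode. The function name and tensor layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(f_phi, support_x, support_y, query_x, query_y, n_way):
    """One episode of a prototypical network (illustrative sketch).

    f_phi     : embedding network mapping inputs to m-dimensional vectors.
    support_* : support samples and integer labels in {0, ..., n_way - 1}.
    query_*   : query samples and integer labels in {0, ..., n_way - 1}.
    """
    z_support = f_phi(support_x)               # (N_support, m)
    z_query = f_phi(query_x)                   # (N_query, m)

    # Class prototypes: mean embedding of each class's support samples.
    prototypes = torch.stack(
        [z_support[support_y == k].mean(dim=0) for k in range(n_way)]
    )                                           # (n_way, m)

    # Squared Euclidean distance of every query embedding to every prototype.
    dists = torch.cdist(z_query, prototypes) ** 2   # (N_query, n_way)

    # Softmax over negative distances gives class probabilities;
    # cross_entropy on these logits is the negative log-likelihood.
    loss = F.cross_entropy(-dists, query_y)
    acc = (dists.argmin(dim=1) == query_y).float().mean()
    return loss, acc
```

In the paper, this episode loss is minimized over many randomly sampled training episodes with stochastic gradient descent.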
Today Jonas Utz presents the paper "Prototypical Networks for Few-shot Learning"
We propose Prototypical Networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend Prototypical Networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.